Initialization Methods for an EMG-based Silent Speech Recognizer
Abstract
The application of surface electromyography (EMG) to automatic speech recognition is a relatively new research field that has developed rapidly in recent years. Previous work in this area was usually limited to distinguishing whole utterances, but the first systems that recognize continuous speech from EMG signals have recently been developed. Continuous speech recognition uses a phoneme-based recognizer, whose initialization requires exact time-alignments of the training data. These can be generated from audio signals that are recorded in parallel with a conventional microphone and then processed by a conventional speech recognizer. The main application of surface EMG in speech recognition is the recognition of silent speech. In this situation, audio-generated time-alignments are not readily available, so another way to initialize a silent speech EMG recognizer must be found. Most notably, because articulation differs between audible and silent speech, simply using a recognizer trained on EMG signals of audible speech to recognize silent speech is not the best option. This work deals with initializing an EMG-based recognizer for silent speech. I compare different methods to achieve this goal, including the manual generation of time-alignments, and evaluate them by the results of the final silent speech recognition step on a large corpus of EMG recordings of silent speech. I find that a recognizer is best initialized by "Cross-Modal Labeling", which involves computing time-alignments for the EMG recordings of silent speech and then training a full EMG recognizer on the silent speech recordings. Compared to the baseline method of training a recognizer on audible EMG and testing it on silent EMG ("Cross-Modal Testing"), which gives a word error rate (WER) of 91.0%, Cross-Modal Labeling yields a WER of 77.5%, a significant relative improvement of 14.8%. Moreover, optimizing this process by iterating the computation of time-alignments gives a WER of 71.01%, a relative improvement of 8.42% over the original Cross-Modal Labeling approach. These are the best results obtained so far on the silent speech part of the EMG-PIT corpus.
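The relative improvements quoted in the abstract follow from the usual definition of relative error reduction, (baseline − new) / baseline. The short sketch below recomputes them from the WER figures given above; the function name `relative_improvement` and the variable names are illustrative choices, not taken from the original work.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# WER figures (in percent) as quoted in the abstract.
cross_modal_testing = 91.0    # baseline: train on audible EMG, test on silent EMG
cross_modal_labeling = 77.5   # Cross-Modal Labeling
iterated_labeling = 71.01     # Cross-Modal Labeling with iterated time-alignments

step_1 = relative_improvement(cross_modal_testing, cross_modal_labeling)
step_2 = relative_improvement(cross_modal_labeling, iterated_labeling)

print(f"Cross-Modal Testing -> Cross-Modal Labeling: {step_1:.1f}%")
print(f"Cross-Modal Labeling -> iterated labeling:   {step_2:.2f}%")
```

With the rounded figures from the abstract, the first step reproduces the stated 14.8%. The second step comes out at about 8.37% rather than 8.42%; this small gap would be consistent with the 77.5% figure being a rounded value, although the abstract does not say so.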
Similar resources
Impact of different speaking modes on EMG-based speech recognition
We present our recent results on speech recognition by surface electromyography (EMG), which captures the electric potentials that are generated by the human articulatory muscles. This technique can be used to enable Silent Speech Interfaces, since EMG signals are generated even when people only articulate speech without producing any sound. Preliminary experiments have shown that the EMG signa...
Array-based Electromyographic Silent Speech Interface
An electromyographic (EMG) Silent Speech Interface is a system which recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. This study is concerned with introducing an EMG recording system based on multi-channel electrode arrays. We first present our new system and introduce a method to deal with undertraining eff...
Decision-tree based Analysis of Speaking Mode Discrepancies in EMG-based Speech Recognition
This study is concerned with the impact of speaking mode variabilities on speech recognition by surface electromyography (EMG). In EMG-based speech recognition, we capture the electric potentials of the human articulatory muscles by surface electrodes, so that the resulting signal can be used for speech processing. This enables the user to communicate silently, without uttering any sound. Previ...
A Spectral Mapping Method for EMG-based Recognition of Silent Speech
This paper reports on our latest study on speech recognition based on surface electromyography (EMG). This technology allows for Silent Speech Interfaces since EMG captures the electrical potentials of the human articulatory muscles rather than the acoustic speech signal. Therefore, our technology enables speech recognition to be applied to silently mouthed speech. Earlier experiments indicate ...
Impact of lack of acoustic feedback in EMG-based silent speech recognition
This paper presents our recent advances in speech recognition based on surface electromyography (EMG). This technology allows for Silent Speech Interfaces since EMG captures the electrical potentials of the human articulatory muscles rather than the acoustic speech signal. Our earlier experiments have shown that the EMG signal is greatly impacted by the mode of speaking. In this study we extend...